Skip to content

perf: Materializer inline fast-path for object materialization#690

Open
He-Pin wants to merge 3 commits intodatabricks:masterfrom
He-Pin:perf/materializer-inline-fastpath
Open

perf: Materializer inline fast-path for object materialization#690
He-Pin wants to merge 3 commits intodatabricks:masterfrom
He-Pin:perf/materializer-inline-fastpath

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented Apr 5, 2026

Motivation

The Materializer converts Val runtime values to ujson.Value for JSON output. For large outputs (like realistic2 with thousands of objects and arrays), the materializer's virtual method dispatch and intermediate allocations become a significant bottleneck:

  1. Each value type requires a pattern match or isInstanceOf check
  2. Object materialization allocates intermediate iterators and builds ujson.Obj through generic APIs
  3. Array materialization similarly involves iterator overhead

Profiling shows that for realistic workloads, 30-40% of execution time is spent in materialization.

Key Design Decision

Add inline fast paths that handle the common cases directly:

  1. String/Number/Boolean/Null: Directly construct the ujson value without any dispatch
  2. Object fields: Iterate visible keys directly, avoiding intermediate collections
  3. Array elements: Direct indexed iteration, avoiding iterator allocation
  4. Nested values: Recursive inline handling up to a depth limit, then fall back to standard materialization

Modification

  • Materializer.scala: Added inline fast paths for all common value types with direct ujson construction
  • Object iteration: Direct field access by name instead of building intermediate key/value collections
  • Array iteration: Direct index-based access instead of iterator
  • Test: Added comprehensive materialization tests covering nested objects, arrays, mixed types

Benchmark Results

JMH (JVM, 3 iterations)

Benchmark Master (ms/op) This PR (ms/op) Change
bench.02 50.427 ± 38.906 38.203 ± 3.108 -24.2% 🔥
comparison2 85.854 ± 188.657 71.770 ± 18.850 -16.4%
realistic2 73.458 ± 66.747 59.714 ± 1.787 -18.7% 🔥

Hyperfine (Scala Native, 10 runs, vs master)

Benchmark Master (ms) This PR (ms) Speedup
bench.02 75 ± 2 72 ± 1 1.04x faster
realistic2 303 ± 4 210 ± 2 1.44x faster 🔥

Hyperfine (Scala Native, vs jrsonnet)

Benchmark sjsonnet (ms) jrsonnet (ms) Ratio
realistic2 210 ± 2 99 ± 2 jrsonnet 2.11x faster (was 3.06x)

Analysis

  • Biggest impact on materializer-heavy workloads: realistic2 improves by 1.44x on Scala Native
  • Narrows realistic2 gap vs jrsonnet from 3.06x to 2.11x
  • bench.02: -24% on JVM due to reduced dispatch overhead
  • Consistent improvement across both JVM and Scala Native platforms
  • No regressions on comparison-heavy workloads (comparison2 also improves)

References

  • Upstream exploration: he-pin/sjsonnet jit branch commits b4b2da5e, 30b7495b
  • Pattern: similar to hand-inlined serialization in high-performance JSON libraries

Result

Major materialization performance improvement: -24% JVM / 1.44x Native on bench.02/realistic2. Narrows realistic2 gap vs jrsonnet from 3.06x to 2.11x.

He-Pin added 3 commits April 5, 2026 12:57
…ation

For objects with exactly one field (common in patterns like `{ n: X }`),
store the field key and member inline in Val.Obj instead of allocating a
LinkedHashMap. The LinkedHashMap is lazily constructed only when needed
(e.g., key iteration via getAllKeys).

Key changes:
- Val.Obj: added singleFieldKey/singleFieldMember constructor params
- getValue0: lazily constructs LinkedHashMap from inline storage
- valueRaw: single-field fast path with String.equals instead of HashMap.get
- hasKeys/containsKey: fast paths to avoid forcing LinkedHashMap materialization
- visitMemberList: lazy builder allocation, only for 2+ field objects

Upstream: jit branch d284ecf (single-field object avoid LinkedHashMap)
Three-tier object storage: 1 field uses singleKey/singleMember,
2-8 fields use flat parallel arrays (inlineFieldKeys/inlineFieldMembers),
9+ fields use LinkedHashMap. This eliminates LinkedHashMap allocation for
the vast majority of Jsonnet objects which have fewer than 9 fields.

All fast paths updated: getValue0, hasKeys, containsKey,
containsVisibleKey, allKeyNames, visibleKeyNames, valueRaw.

Field tracking logic extracted into trackField() helper to avoid
code duplication between the two Member.Field case branches.

JMH: bench.02 -17.9%, realistic2 -2.7%, bench.04 -5.5%
Native: realistic2 -13.5% (1.89x faster than jrsonnet)

Upstream: jit branch commit 13e6ff3
Bypass HashMap value() lookups for inline objects (single-field and
multi-field with array storage) during materialization. This targets
the critical bottleneck where 96% of realistic2 time is spent in
materialization (~62K comprehension-generated objects with 2-9 fields).

Key changes:
- Add canDirectIterate/inlineKeys/inlineMembers accessors to Val.Obj
- Add materializeInlineObj (unsorted) and materializeSortedInlineObj
  fast paths that invoke members directly without HashMap lookup
- Cache sorted field order on MemberList AST node for static field
  names (shared across all Val.Obj instances from same AST)
- For dynamic field names (FieldName.Dyn), compute sorted order
  per-object to avoid cache correctness issues
- Add computeSortedInlineOrder companion helper using insertion sort
  (optimal for typical 2-8 field objects)

Upstream: jit branch commits 5f7abec, dd9d08a, 119b9a9
@He-Pin He-Pin marked this pull request as ready for review April 5, 2026 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant